Skip to content

[2.3.0] performance enhancements#60

Draft
TheAngryRaven wants to merge 12 commits into
masterfrom
BETA
Draft

[2.3.0] performance enhancements#60
TheAngryRaven wants to merge 12 commits into
masterfrom
BETA

Conversation

@TheAngryRaven

@TheAngryRaven TheAngryRaven commented Jun 11, 2026

Copy link
Copy Markdown
Owner

Rollup of the BETA hardening work driven by the professional code review — fixing both criticals that weren't slipped, three HIGHs, plus CI supply-chain pinning. Five PRs merged so far: #59, #61, #62, #65, #66.

Security

SD card arbitration race — fixable from radio range (CR-1, #59). acquireSDAccess() was a non-atomic check-then-set on a shared flag, and five BLE commands (LIST, GET:, DELETE:, TLIST, TGET:) ran SdFat directly in the Bluefruit callback task, concurrent with 25 Hz session logging — overlapping commands could close a file mid-read, delete the file being streamed (DELETE: took no lock at all), or corrupt FAT outright. Now: lock transitions are atomic (FreeRTOS critical section; grant/deny table lives in the host-tested sd_access_policy unit), every SD-touching BLE command is deferred to BLUETOOTH_LOOP() so SdFat is single-task, listings hold the lock for the whole walk, and DELETE refuses while a transfer streams. Protocol impact: overlapping commands get the existing BUSY/TERR:BUSY replies instead of executing concurrently.

Fixed

Sleep mode actually sleeps (CR-4, #61). The tach ISR latched tachHavePeriod on the first engine pulse since boot and nothing cleared it, so every sleep entry instantly RPM-bounced back into race mode with logging enabled, draining the pack overnight. enterSleepMode() now calls TACH_SLEEP() to re-arm the wake trigger; a real engine pulse still wakes straight into race mode.

Lap times rendered wrong on the live pages (H-1/H-2, #61). Three pages appended millisecond zero-padding after the value (1:23.007 displayed as 1:23.700 — a 693 ms error), and the lap list did no padding at all. All six divergent inline copies of the ms→M:SS.mmm math were replaced by the host-tested lap_format unit (three zero-minutes styles preserve each page's layout; replay's already-correct rendering unchanged).

Track detection no longer throttles the main loop (H-3, #66). The O(N) software-double haversine manifest scan was gated on gpsData.fix (which stays true between PVT updates) and ran every ~250 Hz loop iteration. Now throttled to 1 Hz.

Tach ring can't be lapped during SD stalls (H-4, #66). The pulse ISR advanced head unconditionally; a documented 100 ms–2 s SD GC stall at racing RPM overran the 16-entry ring, breaking the SPSC invariant and logging confident-but-wrong RPM that also fed the >500 RPM auto-race trigger. The ISR now checks full, drops, and flags; TACH_LOOP() discards the one period spanning the gap.

GPS baud recovery can't trip the watchdog (H-5, #66). GPS_BAUD_RECOVERY() (up to three ~1.1 s probes + ~500 ms delays) could out-wait the 4 s WDT against a hung module — reset → recovery → reset boot loop. It now pets the WDT before each blocking probe.

Changed

CI pins DovesLapTimer per channel (#62 + #65). BETA-targeted builds track the library's BETA branch so the two beta channels move together; master CI and release/tag builds pin v4.1.0 instead of floating on the library's default-branch tip. compile-sketch selects the ref dynamically from the build's target branch, so the same file content is correct on both branches and nothing needs flipping at release time. (Master got the identical workflow commit directly via #63.)

Test infrastructure

Three new host-tested pure units per the project convention — sd_access_policy (arbitration decision table), lap_format (lap-time rendering incl. the exact H-1 regression case), and tach_filter (Kalman predict/update math, partial H-8) — all wired into the unit-test, coverage, and clang-tidy CI jobs. Host suite is now 118 test cases / 1595 assertions.

Known-remaining (deliberately not in this release)

  • Two slipped criticals, accepted as known issues for now.
  • H-6/H-7: BirdsEye.ino god-file decomposition and layering — needs module-boundary decisions first.
  • H-8 remainder: extracting the OTA protocol state machine and BLE command dispatcher to tested units — each deserves its own reviewed PR.

Full details per change in CHANGELOG.md [Unreleased], which rolls into 2.3.0 with this merge.

claude and others added 2 commits June 11, 2026 03:32
…k task

The SD access 'mutex' was a non-atomic check-then-set on a volatile int,
called from both the main loop and the Bluefruit callback task, and several
BLE commands ran SdFat directly in the callback:

- LIST/TLIST iterated directories for seconds while only peeking the flag
- GET:/TGET: started transfers in the callback, closing a file the main
  loop could be mid-read() on
- DELETE: took no lock at all and could remove the file being streamed

Any of these could interleave with 25 Hz session logging: FAT corruption,
lost logs, SPI wedge — triggerable from radio range.

Fixes:
- acquireSDAccess()/releaseSDAccess() now commit state transitions inside
  a FreeRTOS critical section (taskENTER_CRITICAL — BASEPRI-masked, so
  SoftDevice radio interrupts are unaffected).
- The grant/deny decision table is extracted into the host-tested
  sd_access_policy pure unit (single source of truth for the SD_ACCESS_*
  values; covered by tests/sd_access_policy_test.cpp, wired into the
  unit-test, coverage, and clang-tidy CI jobs).
- LIST/GET:/DELETE:/TLIST/TGET: are deferred to BLUETOOTH_LOOP() via a
  one-deep command buffer, like the settings/track/OTA commands already
  were — SdFat is never touched from the callback task. A second command
  while one is queued gets the protocol's BUSY / TERR:BUSY reply.
- Directory listings hold the SD lock for the whole walk; bleDeleteFile()
  takes the lock and refuses with BUSY while a transfer is streaming (the
  idempotent same-mode re-acquire would otherwise let it delete the
  streamed file and then drop the transfer's lock).
- Queued commands are dropped on disconnect/BLE_STOP so a stale command
  can't execute on the next session.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ion-mhs8yd

Fix SD arbitration race (CR-1): atomic acquire + no SdFat in the BLE callback task
@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown

Coverage — host-testable units

📂 Overall coverage

Metric Coverage
Lines 🟢 255/256 (99.6%)
Functions 🟢 26/26 (100.0%)
Branches 🟡 233/267 (87.3%)

📄 File coverage

File Lines Functions Branches
BirdsEye/crc32.cpp 🟢 30/30 (100.0%) 🟢 4/4 (100.0%) 🟢 24/24 (100.0%)
BirdsEye/dovex_header.cpp 🟢 96/97 (99.0%) 🟢 6/6 (100.0%) 🔴 57/88 (64.8%)
BirdsEye/filename_validator.cpp 🟢 14/14 (100.0%) 🟢 1/1 (100.0%) 🟢 30/30 (100.0%)
BirdsEye/gps_time.cpp 🟢 45/45 (100.0%) 🟢 6/6 (100.0%) 🟢 30/32 (93.8%)
BirdsEye/gps_validation.cpp 🟢 24/24 (100.0%) 🟢 2/2 (100.0%) 🟢 66/66 (100.0%)
BirdsEye/haversine.cpp 🟢 8/8 (100.0%) 🟢 1/1 (100.0%) ⚫ 0/0 (0.0%)
BirdsEye/lap_format.cpp 🟢 18/18 (100.0%) 🟢 1/1 (100.0%) 🟢 9/9 (100.0%)
BirdsEye/sd_access_policy.cpp 🟢 5/5 (100.0%) 🟢 2/2 (100.0%) 🟢 10/10 (100.0%)
BirdsEye/tach_filter.cpp 🟢 15/15 (100.0%) 🟢 3/3 (100.0%) 🟡 7/8 (87.5%)

@TheAngryRaven TheAngryRaven changed the title [2.4.0] performance enhancements [2.3.0] performance enhancements Jun 11, 2026
claude and others added 10 commits June 11, 2026 03:50
CR-4: Sleep mode was permanently defeated after the first engine pulse.
The tach ISR set tachHavePeriod and nothing ever cleared it, so every
sleep entry (long-press, 5-min menu idle, USB) read the stale flag and
instantly bounced through exitSleepMode(true) back into race mode with
logging enabled — silently starting a session and draining the pack
overnight. New TACH_SLEEP() (called from enterSleepMode()) re-arms the
wake trigger and drops stale ring-buffer/Kalman state so the wake's RPM
computation starts clean; the ISR stays attached and the next real pulse
still RPM-wakes straight into race mode as designed.

H-1/H-2: Lap-time formatting was six divergent inline copies of the
60000/%1000 math with four padding behaviors. Three live pages
(current/best/optimal lap) appended the millisecond zero-padding AFTER
the value — 1:23.007 rendered as 1:23.700, a 693 ms error — and the
lap-history list did no padding at all (1:5.7). All six sites now call
one host-tested pure unit, lap_format::formatLapTime() (ms always three
zero-padded digits), with three zero-minutes styles preserving each
page's layout: kOmit (replay results, already-correct rendering
unchanged), kShow (lap list, now a proper 0:SS.mmm fixed field), and
kSpace (big-font live pages, column-stable decimal point). Covered by
tests/lap_format_test.cpp including the exact H-1 regression case;
wired into the unit-test, coverage, and clang-tidy CI jobs.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ormat-mhs8yd

Fix sleep-mode RPM-wake latch (CR-4) and lap-time rendering (H-1/H-2)
…aster/release pin v2.3.1

beta.yml installs the lap-timer library from its BETA branch so the two
beta channels move together. release.yml pins the known-good v2.3.1 tag
instead of floating on the library's default-branch tip. compile-sketch
picks the ref dynamically from the build's target branch (pull_request
base_ref or push/dispatch ref_name): BETA-targeted builds use the
library's BETA branch, everything else uses v2.3.1 — one file content
works unchanged on both branches, so BETA→master merges stay clean and
nothing needs flipping at release time.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
CI: DovesLapTimer per channel — BETA builds track library BETA, master/release pin v2.3.1
…0, not v2.3.1

v2.3.1 doesn't exist in the DovesLapTimer repo, so the library checkout
failed ('pathspec v2.3.1 did not match any file(s)') and the
compile-sketch jobs went down before compiling anything. Docs updated to
match.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…llowup-mhs8yd

CI: fix DovesLapTimer pin to v4.1.0 — #62 merged before the correction landed
…ery (H-3/H-4/H-5, partial H-8)

H-3: trackDetectionLoop() gated only on gpsData.fix, which stays true
between PVT updates, so the O(N) software-double haversine manifest scan
(several ms at the 200-entry ceiling on the single-precision-FPU M4F)
ran every ~250 Hz loop iteration, collapsing the loop rate. Throttled to
1 Hz — still instant at driving pace.

H-4: the tach pulse ISR advanced the ring head unconditionally. An SD GC
stall (documented 100 ms–2 s — the same failure the GPS serial ring
exists for) at racing RPM overruns the 16-entry buffer; lapping the
consumer breaks the SPSC invariant and TACH_LOOP computes a
confident-but-wrong RPM that lands in the CSV and feeds the >500 RPM
auto-race trigger. The ISR now does the SPSC full check (one slot
sacrificed, matching the GPS serial ISR), drops the pulse, and sets
tachRingOverflow; TACH_LOOP discards the carried prev-timestamp so the
one period spanning the gap is never computed and the Kalman estimate
coasts. tachRingTail becomes volatile (the ISR reads it now); the wake
trigger still fires on dropped pulses.

H-5: GPS_BAUD_RECOVERY() runs from GPS_LOOP() under the armed ~4 s WDT
but can block for up to three ~1.1 s myGNSS.begin() probes plus ~500 ms
of delays — a genuinely hung module out-waits the watchdog and the one
path built to recover a sick GPS becomes a reset/recovery boot loop. It
now pets the WDT before each blocking probe (same treatment
fwStageToFlash() already had).

H-8 (partial): the tach Kalman filter math (predict/update, Q/R/floor
tuning constants, period→RPM conversion) is extracted to the host-tested
tach_filter pure unit per the project convention, with tests covering
convergence, monotonic approach, R-scaling with period count, the
uncertainty floor, and reset semantics. The OTA protocol state machine
and BLE command dispatcher extractions are deliberately deferred — they
are large refactors that deserve their own PRs.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
…ening-mhs8yd

Fix hot-path track scan, tach ring overflow, WDT-unsafe GPS recovery (H-3/H-4/H-5, partial H-8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants